
Add implementation of Kafka 0.11 Records #973

Merged: 17 commits into master from the records branch, Oct 31, 2017
Conversation

@wladh (Contributor) commented Oct 26, 2017

This PR introduces an implementation of the new Record and RecordBatch formats from Kafka 0.11. It also introduces a union type for requests/responses that can contain either records or messages.
Issue #901

Kafka supports nullable arrays, and their null value is represented by a
length of -1.
Kafka 0.11 introduces a new Record format that replaces Message from the
previous versions. The new format allows for Headers which are
key-value pairs of application metadata associated with each message.
Kafka 0.11 introduced RecordBatch as a successor to MessageSet.
Using the new RecordBatch is required for transactions and idempotent
message delivery.
Many request/response structures can contain either RecordBatches or
MessageSets depending on the version of Kafka the client is talking to.
This changeset implements a sum type that makes it more convenient to
work with these structures by abstracting away the type of the records.
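The sum type described above can be sketched roughly as follows. This is an illustrative simplification, not sarama's actual definition: the constant and field names mirror the diff snippets in this PR, but the payload types are toy stand-ins.

```go
package main

import "fmt"

// Toy stand-ins for the two payload formats; the real types are far richer.
type MessageSet struct{ Messages []string }
type RecordBatch struct{ Records []string }

const (
	legacyRecords  = iota // pre-0.11 MessageSet
	defaultRecords        // 0.11+ RecordBatch
)

// Records is the sum type: exactly one of msgSet / recordBatch is set,
// and recordsType says which, so callers never branch on Kafka version.
type Records struct {
	recordsType int
	msgSet      *MessageSet
	recordBatch *RecordBatch
}

func (r *Records) numRecords() (int, error) {
	switch r.recordsType {
	case legacyRecords:
		return len(r.msgSet.Messages), nil
	case defaultRecords:
		return len(r.recordBatch.Records), nil
	}
	return 0, fmt.Errorf("unknown records type: %v", r.recordsType)
}

func main() {
	r := &Records{
		recordsType: defaultRecords,
		recordBatch: &RecordBatch{Records: []string{"a", "b"}},
	}
	n, err := r.numRecords()
	fmt.Println(n, err)
}
```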
@eapache (Contributor) left a comment

This is mostly pretty straightforward, just a couple of things.

My biggest concern is around the varint length calculations. I understand the problem you're facing with a dynamic reserve length for the encoder, but I find the solution itself a bit hard to follow. It also results in prep-encoding the majority of the record-set a couple more times than is strictly needed.

For the prep-encoder, the actual reserve length doesn't matter at all; there should be no need to recurse because counting up the lengths is just addition so the order doesn't matter.

For the real-encoder, the prep-encoder has already run so you should be able to know the appropriate reserve length at that point?

@@ -79,7 +79,7 @@ func (rd *realDecoder) getArrayLength() (int, error) {
rd.off = len(rd.raw)
return -1, ErrInsufficientData
}
-	tmp := int(binary.BigEndian.Uint32(rd.raw[rd.off:]))
+	tmp := int(int32(binary.BigEndian.Uint32(rd.raw[rd.off:])))
Contributor
Why cast to int32 and then int?

Contributor Author

Because we want to convert it to a signed value, we convert it to its signed counterpart first (int is usually 64 bits on 64-bit platforms, so converting directly to int wouldn't preserve the sign).

record.go Outdated
controlMask = 0x20
)

type Header struct {
Contributor

should we call this RecordHeader to be a bit clearer?

Contributor Author

Ok

return 0, fmt.Errorf("unknown records type: %v", r.recordsType)
}

func (r *Records) isPartial() (bool, error) {
Contributor

this method (and isControl below) don't have any coverage AFAICT

Contributor Author

Added some coverage for them.

@wladh (Contributor Author) commented Oct 26, 2017

Indeed, the uncompressed records will be prep-encoded twice. But I didn't find a clean solution (one that doesn't involve checking in encode() whether the encoder is real or prep).
I could make varintLengthField also a pushEncoder and keep it around in Record between the encoder runs.

@eapache (Contributor) commented Oct 26, 2017

> I could make varintLengthField also a pushEncoder and keep it around in Record between the encoder runs.

I think that sounds like it might be simpler? If pushEncoder.run could also return additional length then I think this becomes very nice?

@wladh (Contributor Author) commented Oct 27, 2017

Actually, looking at the code again, that wouldn't work without a few more changes. pushEncoder.run is not called by prepEncoder.pop, because the run methods actually write the field. We could define a pushEncoder.prepRun method to be called by prepEncoder.pop, but that's not very appealing to me because it re-introduces knowledge of the encoding phase into encoders that shouldn't care about it. That was one thing I tried to avoid in my solution (and it should work even if we do away with prepEncoder, for instance, and use vectored I/O or builders).

@eapache (Contributor) commented Oct 27, 2017

Hmm, ya. What if reserveLength() was allowed to return a special const dynamicLength = -1? Then the prepEncoder could just call it again in pop if it was flagged as dynamic during the push? Or even a separate isDynamic() bool method on the interface or something...

Added a dynamicPushEncoder interface that extends pushEncoder with an
adjustLength method that is called at prepEncoder.pop() time so that it
computes the actual length of the field.
Also made varintLengthField implement this method so we can avoid a
needless run of the prepEncoder for uncompressed records.
@wladh (Contributor Author) commented Oct 30, 2017

So I added this additional interface dynamicPushEncoder that defines an additional method that will be called during prepEncoder.pop() to get the actual field length. I'm not thrilled with this solution as it implies a prep phase, but at least the knowledge of it is limited to the varintLengthField.
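The mechanism being described can be sketched with heavily simplified interfaces. This is not sarama's actual code; only the shape of the push/putBytes/pop interaction with adjustLength is shown, and the prep encoder here does nothing but count bytes.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// pushEncoder, heavily simplified: it only declares its reserved size.
type pushEncoder interface {
	reserveLength() int
}

// dynamicPushEncoder adds a hook called at pop() time, once the bytes the
// field covers have been counted. It returns the difference between the
// field's actual size and what reserveLength claimed at push time.
type dynamicPushEncoder interface {
	pushEncoder
	adjustLength(currOffset int) int
}

// varintLengthField reserves nothing up front and fixes itself up at pop.
type varintLengthField struct{ startOffset int }

func (l *varintLengthField) reserveLength() int { return 0 }

func (l *varintLengthField) adjustLength(currOffset int) int {
	var buf [binary.MaxVarintLen64]byte
	// Size of the varint prefix for everything pushed since startOffset.
	return binary.PutVarint(buf[:], int64(currOffset-l.startOffset))
}

// prepEncoder is a toy prep pass: it only counts bytes.
type prepEncoder struct {
	length int
	stack  []pushEncoder
}

func (pe *prepEncoder) push(in pushEncoder) {
	if f, ok := in.(*varintLengthField); ok {
		f.startOffset = pe.length
	}
	pe.length += in.reserveLength()
	pe.stack = append(pe.stack, in)
}

func (pe *prepEncoder) putBytes(n int) { pe.length += n }

func (pe *prepEncoder) pop() {
	in := pe.stack[len(pe.stack)-1]
	pe.stack = pe.stack[:len(pe.stack)-1]
	// Only dynamic fields get a second chance to report their real size.
	if dpe, ok := in.(dynamicPushEncoder); ok {
		pe.length += dpe.adjustLength(pe.length)
	}
}

func main() {
	pe := &prepEncoder{}
	pe.push(&varintLengthField{})
	pe.putBytes(200) // a 200-byte record body
	pe.pop()
	fmt.Println(pe.length) // body plus the varint prefix for length 200
}
```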

@eapache (Contributor) left a comment

I like this approach better, thanks. Just some minor stuff now.

length_field.go Outdated
}

func (l *varintLengthField) run(curOffset int, buf []byte) error {
if !l.adjusted {
Contributor

I don't think this is necessary, it will already fail in a pretty obvious way if this mistake is made.

Contributor Author

I can remove it, but I think the failure won't be too obvious (it will have length 0, which could happen in other scenarios as well).

length_field.go Outdated
@@ -31,22 +31,43 @@ func (l *lengthField) check(curOffset int, buf []byte) error {
type varintLengthField struct {
startOffset int
length int64
adjusted bool
size int
@eapache (Contributor) commented Oct 30, 2017

Having both size and length fields is going to cause confusion. Isn't size always calculable? Should we just put that logic in reserveLength() and call it as needed?

Contributor Author

size could always be calculated, but that involves encoding the varint into a buffer, so it's not very cheap.

Contributor

It looks pretty cheap to me? The buffer doesn't escape so it will be allocated on the stack, and the actual encoding itself is just a couple of bit-ops in a very short for-loop.
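The point can be illustrated with Go's encoding/binary: measuring a zigzag varint is a single PutVarint call into a fixed-size buffer that never escapes to the heap. The helper below is a sketch, not sarama's actual code.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// varintSize reports how many bytes v occupies when zigzag-varint encoded.
// buf does not escape, so it stays on the stack: no heap allocation.
func varintSize(v int64) int {
	var buf [binary.MaxVarintLen64]byte
	return binary.PutVarint(buf[:], v)
}

func main() {
	// The boundary between 1- and 2-byte encodings sits at 64 / -65
	// because of the zigzag mapping.
	fmt.Println(varintSize(0), varintSize(63), varintSize(64), varintSize(-64), varintSize(-65))
}
```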

pushEncoder

// Called during pop() to adjust the length of the field.
adjustLength(currOffset int) int
Contributor

I'd maybe call this updateLength but it doesn't matter. The description should mention that the return value is the diff though, not the new length.

Actually, does it need to return anything? The caller can always just call reserveLength again, and subtract the old reserveLength beforehand if it wants the diff?

Contributor Author

That would complicate the caller, since in practice most callers will call reserveLength at push time and then again at pop time just to readjust.
The initial call to reserveLength might not return 0 (especially if we go with your suggestion of computing the size on demand), so most of the time, at adjust time, you want the diff.
I will update the comment.

record.go Outdated
Headers []*RecordHeader

length varintLengthField
totalLength int
Contributor

I believe this is unused now.

record.go Outdated
return 0, err
}
}
return int(r.length.length) + r.length.size, nil
Contributor

It would be simpler and more reliable to just always prep-encode and then ask the encoder. Otherwise this is going to bite people who e.g. add a header or something.

record_batch.go Outdated
case CompressionNone:
re = pe
case CompressionGZIP, CompressionLZ4, CompressionSnappy:
if err := b.computeRecordsLength(); err != nil {
Contributor

This block here is doing basically the same thing as the global encode(encoder) ([]byte, err) method, i.e. using a prep-encoder to calculate the length and then making a byte array and using a real-encoder to encode it?

Contributor Author

You're right, I can reuse that.
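The pattern being referred to, prep-encode to compute the size, allocate once, then real-encode into the buffer, looks roughly like this. This is a minimal sketch with toy interfaces and errors elided; sarama's actual encode helper and packetEncoder interface carry more.

```go
package main

import "fmt"

// packetEncoder is a toy version of a packet encoder interface.
type packetEncoder interface {
	putBytes(b []byte)
}

// encoder is anything that can write itself through a packetEncoder.
type encoder interface {
	encode(pe packetEncoder)
}

// prepEncoder only counts bytes; it never writes anything.
type prepEncoder struct{ length int }

func (pe *prepEncoder) putBytes(b []byte) { pe.length += len(b) }

// realEncoder writes into a pre-sized buffer.
type realEncoder struct {
	raw []byte
	off int
}

func (re *realEncoder) putBytes(b []byte) {
	copy(re.raw[re.off:], b)
	re.off += len(b)
}

// encode runs the prep pass to compute the length, allocates once,
// then runs the real pass to fill the buffer.
func encode(e encoder) []byte {
	prep := &prepEncoder{}
	e.encode(prep)
	re := &realEncoder{raw: make([]byte, prep.length)}
	e.encode(re)
	return re.raw
}

// payload is a trivial encoder used for demonstration.
type payload []byte

func (p payload) encode(pe packetEncoder) { pe.putBytes(p) }

func main() {
	buf := encode(payload("hello"))
	fmt.Println(len(buf), string(buf))
}
```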

@wladh (Contributor Author) commented Oct 31, 2017

It seems that travis-ci failed because one broker didn't come up. Not sure how to retrigger the check.

@eapache (Contributor) commented Oct 31, 2017

I can retrigger CI as needed; I've never been able to figure out how to let contributors do that without giving them full access to everything.

@@ -50,3 +50,14 @@ type pushEncoder interface {
// of data to the saved offset, based on the data between the saved offset and curOffset.
run(curOffset int, buf []byte) error
}

// dynamicPushEncoder extends the interface of pushEncoder for uses cases where the length of the
// fields itself is unknown until its value was computed (for instance varint encoded lenght
Contributor

typo s/lenght/length

record_batch.go Outdated
return pe.pop()
}

var raw []byte
Contributor

if you move this whole block (down to line ~126) into a method, and invert the check on b.compressedRecords you can get rid of lines 77-85 which are duplicates of 127-135.

@wladh (Contributor Author) commented Oct 31, 2017

Maybe I should add a dynamicPushDecoder that will call a decodeField method during push and skip reserveLen.

dynamicPushDecoder extends pushDecoder for cases when the field has
variable length.
Also, changed varintLengthField to make use of the new interface.
@eapache (Contributor) commented Oct 31, 2017

This is really nice, thanks!

@eapache eapache merged commit eca6c1c into IBM:master Oct 31, 2017
@wladh wladh deleted the records branch October 31, 2017 18:13